Automatic purpose-driven basis set truncation for time-dependent Hartree–Fock and density-functional theory

Real-time time-dependent density-functional theory (RT-TDDFT) and linear response time-dependent density-functional theory (LR-TDDFT) are two important approaches to simulate electronic spectra. However, the basis sets used in such calculations are usually the ones designed mainly for electronic ground state calculations. In this work, we propose a systematic and robust scheme to truncate the atomic orbital (AO) basis set employed in TDDFT and TD Hartree–Fock (TDHF) calculations. The truncated bases are tested for both LR- and RT-TDDFT as well as RT-TDHF approaches, and provide an acceleration up to an order of magnitude while the shifts of excitation energies of interest are generally within 0.2 eV. The procedure only requires one extra RT calculation with 1% of the total propagation time and a simple modification on basis set file, which allows an instant application in any quantum chemistry package supporting RT-/LR-TDDFT calculations. Aside from the reduced computational effort, this approach also offers valuable insight into the effect of different basis functions on computed electronic excitations and further ideas on the design of basis sets for special purposes.


Ruocheng Han 1 , Johann Mattiat 1 & Sandra Luber 1
Real-time time-dependent density-functional theory (RT-TDDFT) and linear response time-dependent density-functional theory (LR-TDDFT) are two important approaches to simulate electronic spectra. However, the basis sets used in such calculations are usually the ones designed mainly for electronic ground state calculations. In this work, we propose a systematic and robust scheme to truncate the atomic orbital (AO) basis set employed in TDDFT and TD Hartree-Fock (TDHF) calculations. The truncated bases are tested for both LR-and RT-TDDFT as well as RT-TDHF approaches, and provide an acceleration up to an order of magnitude while the shifts of excitation energies of interest are generally within 0.2 eV. The procedure only requires one extra RT calculation with 1% of the total propagation time and a simple modification on basis set file, which allows an instant application in any quantum chemistry package supporting RT-/LR-TDDFT calculations. Aside from the reduced computational effort, this approach also offers valuable insight into the effect of different basis functions on computed electronic excitations and further ideas on the design of basis sets for special purposes.
Electronically excited states and their properties are among the central topics of quantum chemistry research. The utilized theoretical methods for excited state calculations typically require equivalent or higher computational resources compared to analogous ground state calculations. Highly accurate multiconfigurational methods are computationally demanding and thus can only be applied to small systems.
Time-dependent density-functional theory (TDDFT), due to its good compromise between accuracy and efficiency, has been employed in a wide range of applications, especially for spectroscopy [1][2][3] . Real-time propagation (RTP) has become an appealing technique for the solution of the time-dependent Kohn-Sham calculations, namely, real-time time-dependent density-functional theory (RT-TDDFT), or general approximations to the timedependent Schrödinger equation [4][5][6][7][8][9] . It is based on the evolution of molecular orbitals under the influence of an external field, often with only a δ-pulse (field) applied at the beginning. In the weak field limit within the adiabatic approximation, the spectroscopy simulations using RT-TDDFT and linear response time-dependent density-functional theory (LR-TDDFT) should provide comparable results 4 . During each time step, one needs to construct the Hamiltonian given by the new molecular orbital (MO) coefficients, which is the most timeconsuming part of the RTP. Depending on the quantum chemistry method employed in RTP, the construction of the Hamiltonian may scale to OðN 4 Þ, e.g., for HF Coulomb and exchange matrices calculated with 2-electron integrals, where N refers to the number of AO basis functions. Therefore, reducing the number of basis functions or finding a proper smaller basis set can potentially save a large amount of computational time and memory.
Previous studies on the topic of basis set truncation/reduction have mainly followed three strategies: (1) decreasing the size of the virtual space for frozen natural orbital approximations used in perturbation based methods 10,11 (e.g., Møller-Plesset perturbation theory, coupled cluster single-double and perturbative triple, complete active space perturbation theory), (2) reducing the number of functions in correlation consistent basis sets 12,13 , (3) reducing the number of basis functions of subsystems (which apply expensive wavefunction-based methods) for embedding calculations 14,15 . However, these works focus on the electronic ground state. Multiple embedding techniques have been applied to accelerate RT-TDDFT calculations by treating subsystems with different level of theories [16][17][18][19][20] . The idea of a decomposition of the electric dipole moment into molecular orbital pairs was also proposed in recent works for the acceleration or analyses of spectra [21][22][23] . This work explores the contribution of a fundamental ingredient-basis functions-to the electronic spectra. The truncation of basis functions proposed in this work is designed to check every single component in the basis set (basis function). One can also apply a shell level truncation for general applications. Basis set files can be easily modified for an accelerated simulation of the spectrum and to obtain a better chemical insight into the electric dipole moments contribution. Moreover, a routine to construct complete basis set (CBS) for TDDFT calculations is proposed. The calculations of electronic absorption and ECD spectra in this work take place in a linear response framework within the electric dipole approximation, assuming that the excited states of the system can be well described within the occupied-virtual space spanned by the ground state solution of the system. As standardly done, we assume the adiabatic approximation, discarding the dependence of the exchangecorrelation functional on the history of the propagation. The decomposition of electric dipole moments into the contribution of individual AO basis functions and checking the variation of molecular orbitals (in component of basis functions) during the RTP provides a quantitative evaluation of each AO basis function based on its importance for the electronic spectra under study. This further paves the way for a truncation process on the basis set for the computational speed-up and a way to generate complete basis set for TDDFT calculations.
In this work, we propose a basis set truncation scheme for TDDFT calculations. The method is tested for small molecules up to a highly conjugated system and a metal cluster, and achieves an acceleration up to an order of magnitude in RT-TDDFT or LR-TDDFT calculations with negligible change in the region of interests (e.g., valence-shell transitions) of the computed spectra.

Electric dipole moment
In the context of this work, the electronic part of the electric dipole moment d ! is defined as the trace of the product of the density matrix and integrals of the electric dipole moment operator Àe r ! ( r ! = ðx, y, zÞ, e is elementary charge) in the AO basis with basis functions fχ μ g. For calculating the time-dependent electric dipole moment d ! ðtÞ, we use the AO basis representation for both density matrix P AO (t) and the electric dipole moment integrals D ! as shown in Eq. (1). In this way, only the density matrix P AO (t) is timedependent and D ! remains the same during the RTP for fixed nuclei. P AO (t) can be further expressed in molecular orbital (MO) basis as P MO (MO density matrix after SCF, see Eq. (3) where f i is the occupation number of the ith MO) and the time-dependent part is only carried by the MO coefficients C(t) and its complex conjugate C † (t) (see Eq. (2)). In this work, the AO basis functions are all Gaussiantype orbitals.
P AO ðtÞ = CðtÞP MO C y ðtÞ ð2Þ Real-time propagation In our implementation, the MO coefficients C(t) are propagated for a small timestep Δt (see Eq. (4)) using the "enforced time-reversal symmetry" (ETRS) 24 scheme. U(t + Δt) represents the propagator at time t + Δt and is calculated with Eq. (5), where S is the overlap matrix in AO basis and F(t) is the Fock matrix or Kohn-Sham (KS) matrix in AO basis at time t. C(t + Δt), U(t + Δt), and F(t + Δt) are computed selfconsistently 24 .
Cðt + ΔtÞ = Uðt + ΔtÞCðtÞ ð 4Þ F(t) needs to be constructed for each time step and usually contributes most to the computational time in RTP. For example, the elements of the HF exchange matrix K μν (t) are given in Eq. (6) ( μλ|σν are the two-electron repulsion integral (ERIs) in AO basis expressed in Eq. (7)), and the computation of the exchange matrix K(t), which is required for the construction of F(t), scales as OðN AO 4 Þ (N AO is the number of AO basis functions).

AO basis truncation
In order to decrease N AO , we first analyse Eq. (1) for the electric dipole contribution from each AO basis function. For the sake of simplicity, O ! μ ðtÞ is used to represent the μth diagonal element of P AO ðtÞ D ! , and thus d ! ðtÞ can then be rewritten as in Eq. (8). Taking a detailed look at the construction of O ! μ ðtÞ in Eq. (9), one can find that it provides a decomposed form of electric dipole moments of each basis function. Therefore, we use O ! μ ðtÞ to represent the electric dipole contribution from the μth basis function.
In hχ μ | r ! |χ ν i, the relative position of atoms can cause different values of elements in the matrix, which we would like to avoid. To explain the reason, we can think about a toy system consisting of only two hydrogen atoms with Cartesian coordinates H1 r ! 1 = (0, 0, -a) and H2 r ! 2 = (0, 0, a) where a ≠ 0. It is obvious that e.g., diagonal matrix elements hχ H1 s | r ! |χ H1 s i ≠ hχ H2 s | r ! |χ H2 s i (see Eq. (11)) even though, by symmetry, we expect the same "contribution" of electric dipole from the two atoms. In Eq. (11), each H atom has a Slater-type 1s orbital Ae −ζr where A is normalization constant and r is the distance from the center of the atom, and we change the integration variable from r ! to s ! using One way of minimizing the effect of hχ μ | r ! |χ ν i is to shift the molecular system far from (0, 0, 0), which is equivalent to set a large R !
. It is worth noting that we do not need to formally "move" the molecule, and this is just an assumption made in the derivation from Eq. (9) to Eq. (12). In this method, we only care about the relative value of O ! μ ðtÞ when determining the basis function(s) to be truncated, and R ! provides the same factor to D ! μν and later O ! μ ðtÞ. Therefore, it is safe to substitute D ! μν with S μν in the expression of O ! μ ðtÞ, and thus we have a scalar O μ (t) as shown in Eq. (12). It is worth noting that O μ (t) does not explicitly contain any electric dipole information, which makes sense because the electric dipole cannot be formally defined on a single atomic centered orbital. However, AOs do contribute to the electric dipole by forming MOs across different atomic centers, and the information of such contribution, which we call density matrix contribution, is contained in P AO (t). The expression in Eq. (12) is also known in Mulliken population analysis, as the number of electrons associated with χ μ . The following computations are all based on the scalar form O μ (t).
To quantify the contribution, we use the formula in Eq. (13). x DC μ is an indicator measuring the variation of the Density matrix Contribution (DC) of the μth AO basis function. S t [O μ (t)] computes the standard deviation of O μ (t) for the total simulation time for each μ. The numerator of Eq. (13) indicates variation (along the RTP) of the electric dipole moment contribution from each AO basis function. The dimensionless quantity x DC μ is then constructed by dividing the numerator with its mean value of all AO basis functions. A small x DC μ value means the change of electric dipole moments contributed from the μth basis function is comparatively small among all AO basis functions, and removing this basis function should not change the spectrum (a constant value vanishes after Fourier transform for RT-TDHF/TDDFT) significantly. One might point out that x DC μ cannot distinguish the pulse from different directions because O μ (t) in Eq. (12) is no longer direction dependent like in Eq. (8). However, it is found that P AO (t) still varies according to the direction of the pulse because its action is coded in the MO coefficients.
It is worth noting that we have considered applying basis transformation regarding to P AO ðtÞ D ! , namely using the eigenvectors of P AO (t) (transform to natural orbital basis) or D ! . However, the former one is time-dependent so it is hard to choose a transformation matrix for all time steps, and the latter one is reference point-dependent (see Eq. (10)) and one cannot obtain the consistent truncation choice under translation (note that nuclei do not move in this study as opposed to, e.g., Ehrenfest dynamics). Also, an AO basis is a common choice in most molecular simulations and some solid state simulations (e.g., Gaussian and Plane Waves 25 method in CP2K package (CP2K version 7.0 (Development Version), the CP2K developers group. CP2K is freely available from https://www.cp2k.org/.)). Therefore, the truncation on the AO basis has broad application prospects and can be easily applied by a simple modification of basis set file.
During practical tests of basis truncation, we observed that using only x DC μ as indicator is not enough for obtaining an accurate spectrum. Another indicator x IP μ is introduced which measures the Importance of Propagation stability (IP) of the μth AO basis function (see Eq. (14)). C μj (t) denotes an element in transformation matrix (from AO to MO basis), S t computes the standard deviation along the time for each μ and j, and P N MO j sums over all standard deviations in MOs originating from the μth AO basis function. The numerator of Eq. (14) indicates the variation (along RTP) of the contribution from each AO basis function to all MOs in transformation matrix C(t). As for x DC μ , the dimensionless quantity x IP μ is also constructed by dividing the numerator with its mean value of all AO basis functions. Small x IP μ value means that the contributions to MOs from the μth basis function do not change much compared to the contribution of all AO basis functions, and removing this basis function should not affect the propagation of the density matrix (remaining part) significantly. Note that both O μ (t) and C μj (t) are usually complex numbers for RTP, and the standard deviation of a set of complex numbers is calculated as in Supplementary Eq. (1).
In practice, an empirical parameter x thr is chosen as threshold for both x DC μ and x IP μ , where the AO basis functions with both indicators smaller than x thr can be removed. The remaining basis set fχ μ g trunc (truncated AO basis set) is then defined as in Eq. (15), given the original AO basis set {χ μ } (of which the cardinality jfχ μ gj is N AO ). Sometimes fχ μ g trunc includes only part of the given shell, e.g., for a p-shell, only χ p x and χ p y are in fχ μ g trunc and χ p z is truncated. Such symmetry breaking is mainly due to the utilization of polarized field (δ-pulse) in the RT-TDDFT calculations, and the rotational invariance requires a shell level truncation. Considering the truncated basis set used in any computational chemistry package, we also recommend shell level truncation for the general application of the basis set file. In most cases, the majority rule can be applied for a truncation at the shell level, namely, the shells containing more than half of their original basis functions remain in fχ μ g trunc while others are fully discarded. This has to be checked with the x DC μ indicators of the basis functions in the same shell to ensure that there are no strong contribution to electric dipole transitions arising from some basis functions. In this study, the basis sets of (S)-methyloxirane, (-)-α-pinene, ZnPc, and Ag 20 systems are truncated at the shell level.
The schematic view of the truncation process is shown in Fig. 1.
Using fχ μ g trunc , namely, reducing number of basis functions from N AO to N trunc = |fχ μ g trunc |, can ideally decrease the total computational time to ðN trunc =N AO Þ 4 for a RT-TDHF calculation or a RT-TDDFT calculation with hybrid exchange-correlation functional. Also, this truncated basis set can be transferred to LR-TDHF/TDDFT calculations.
The procedure for carrying out RT-TDHF/TDDFT calculations with truncated AO basis set for the examples studied in this work is described as follows: • Run 100 (400) steps (1% of the total simulation time) of RT-TDHF/RT-TDDFT simulation with the timestep 0.2 (0.05) atomic units using a preliminarily chosen basis set fχ μ g, and collect the information regarding S, P AO (t), C(t) of every step. • Calculate x DC μ and x IP μ via Eqs. (13) and (14), respectively, and select truncated AO basis set fχ μ g trunc based on the criteria in Eq. (15).
• Run 10'000 (40'000) steps of RT-TDHF/TDDFT (full) simulation with the same timestep using fχ μ g trunc . Note that a ground state SCF calculation should be carried out with the truncated basis set before the RT-TDHF/RT-TDDFT simulation in order to apply the perturbation to a converged ground state.
It is worth noting that the truncation procedure by construction eliminates the transitions that are not (or very weakly) electric dipole allowed, and thus this approach focuses more on the overall spectrum rather than the types of transitions.

Complete basis set limit
In addition to the analyses of truncated AO basis functions, we introduce an algorithm to construct basis sets towards the CBS limit for RT(LR)-TDHF/TDDFT calculations.
The idea of CBS limit employed here is to add diffuse functions (see examples (S)-methyloxirane and (-)-α-pinene for the reason) to all types of AO basis functions (s, p, d, f, g, ...) representing different orbital angular momenta l. These functions are added in an eventempered manner 26,27 by a geometric progression of the orbital exponents in the original basis set: α l,k = α l β k l , 8k 2 N. α l,k is an exponent of the l-shell with kth power, and α l and β k l are two parameters to be optimized for the basis set. Since most basis sets available (Pople 28 , Dunning 29 , Jensen 30 , Ahlrichs 31,32 , etc.) provide more than one exponent for each type of shell, we can directly extrapolate from these values to get the additional exponent α l,k + 1 = α 2 l,k =α l,kÀ1 . One can increase the k value until significant linear dependencies are found in the basis set (sometimes also referred to as basis set overcompleteness 33 ).
This CBS scheme usually requires a quite large basis set for the calculation, and it is usually unclear which basis function(s) should be removed once overcompleteness is reached. Therefore, we combine it with AO truncation and propose an "Add-While-Truncate" algorithm (see Algo. 1) to construct the CBS specifically designed for RT(LR)-TDHF/TDDFT calculations. Firstly, a preliminarily chosen basis set fχ μ g is used for a short period of RT-TDHF/TDDFT calculation and fχ μ g trunc is selected. An additional basis set containing diffuse functions (fχ μ g diffuse ) is constructed in an even-tempered manner. fχ μ g diffuse may contain some basis functions truncated in previous steps (combined as fχ μ g deleted ), which should be removed. Then fχ μ g diffuse is combined with fχ μ g trunc to form a new basis set fχ μ g. In order to check the overcompleteness of the newly created AO basis set, we calculate the overlap matrix S fχ μ g and solve for its eigenvalues λ. If the minimal absolute eigenvalue |λ| min is smaller than a user-defined small value ϵ or fχ μ g remains the same as in the last cycle (namely, basis functions are neither truncated nor added), fχ μ g is regarded as the CBS under such ϵcondition (fχ μ g CBSÀϵ ), otherwise the new fχ μ g is used to repeat the previous steps until the final condition is fulfilled. In practice, one can also manually remove some newly added diffuse functions within fχ μ g in the iteration to satisfy the given ϵ-condition. In this case, in order to minimize the total number of basis functions, we first remove the diffuse functions corresponding to higher orbital angular momentum, which is the same idea as the one applied in calendar basis sets 34 . For the sake of simplicity, we use the term "basis functions" for "AO basis functions" in the remaining part of this manuscript. Algorithm 1. Add-While-Truncate CBS Algorithm 1: repeat 2: fχ μ g old fχ μ g 3: Run a RT-TDHF/TDDFT simulation with fχ μ g for 100 (400) steps. 4: Construct fχ μ g trunc by Eqs. (13)-(15) 5: Construct additional even-tempered basis set fχ μ g diffuse for fχ μ g trunc 6: fχ μ g deleted fχ μ g deleted ∪ ðfχ μ g n fχ μ g trunc Þ 7: fχ μ g diffuse fχ μ g diffuse n fχ μ g deleted 8: fχ μ g fχ μ g trunc ∪ fχ μ g diffuse 9: Solve for eigenvalues λ of overlap matrix S fχ μ g 10: until |λ| min < ϵ or fχ μ g = fχ μ g old 11: fχ μ g CBSÀϵ fχ μ g trunc

Example: H 2 dimer
The H 2 dimer is used as the first test system, with the δ-pulse applied along z direction (see the geometry in Fig. 2d, z axis is parallel to the H-H bond, and y axis is perpendicular to the plane formed by the four atoms). Four different basis sets, 6-31G, 6-31G**, 6-31++G, and 6-31+ +G** 28,35,36 , are utilized for RT-TDHF calculations.
For each H atom, 6-31G contains two s-type basis functions (noted as 2s for convenience), 6-31G** contains 2s1p (1p as extra polarization function, we use italic form to represent specific basis function(s), e.g., 1p means the first p-type basis function), 6-31++G contains 3s (3s as an extra diffuse function), and 6-31++G** contains 3s1p. Note that the abbreviations we use here refer to basis functions but not specific  for 6-31G** and 6-31++G**: 2p x and 2p y of each H atom, which belong to the polarization functions. If x thr is set to 0.2 (gray dashed lines in Fig. 2b, d), extra four basis functions are to be deleted in both cases: 2p z of each H atom. Therefore, a setting of x thr = 0.2 essentially truncates 6-31G** to 6-31G and 6-31++G** to 6-31++G for the H 2 dimer. Note that the pulse causes differences between atoms with different nuclear Cartesian coordinates, leading to different x DC -x IP values of the same basis functions in different H atoms. In Supplementary Fig. 1, we further provide a more intuitive view of O μ from each basis function in 6-31++G and 6-31++G** basis sets. One can easily distinguish small contribution components from the large contribution ones.
Let us take a closer look at the x DC -x IP map in the case of 6-31++G** basis set (see basis functions shown in Fig. 2d, we focus only on one H atom here). Actually, we can explain qualitatively that 2p x and 2p y are the least important basis functions for the simulation of the electronic absorption spectrum. The electric dipole transitions from σ 1s−1s to π 2p x À2p x , π * 2p x À2p x , π 2p y À2p y , and π * 2p y À2p y are almost (considering the effect from the other H 2 molecule close-by) forbidden due to symmetry reasons. The x DC -x IP map shows that 2p x is slightly more important than 2p y , which may be explained by a stronger interaction on the x direction between H 2 molecules. The electric dipole transition π * 2p z À2p z σ 1sÀ1s is allowed, and thus the 2p z basis function is considered to be more important than 2p x and 2p y for the electronic absorption spectrum. The electric dipole transition σ * 2sÀ2s σ 1sÀ1s is also allowed and σ * 2sÀ2s has an lower energy than π * 2p z À2p z , which leads to a higher occupation probability. Therefore, 2s in 6-31++G** basis set is one of the "dominant" basis functions in the RTP for H 2 with the computational settings used.
In more complex systems, such an energetic analysis in terms of "static" wavefunctions (e.g., wavefunctions after SCF) is not enough to give a reasonable truncated basis set since MO coefficients C(t) are time-dependent, which explains our choice of using first 100 (400) steps for the analysis.
Besides, a Jaccard index 37 ( J(x thr )) is applied to analyze the similarity of deleted basis functions suggested by our two criteria x DC and x IP (see Eq. (16)). A high Jaccard index indicates more basis functions in common between two sets and vice versa. This information provides an intuitive view of the truncation along x thr for a given basis set.
For H 2 dimer system, it is found that J(x thr ) of 6-31G** and 6-31++G** remains at a value of 1.0 from x thr = 0.01 to x thr = 1.0 (see Fig. 2e). This shows that RTP has clear "preference" for some basis functions within the given basis sets. In the case of 6-31G and 6-31++G basis sets, on the other hand, J(x thr ) remains at a value of 0.0 up to x thr = 0.9, which means that no redundant basis functions are found for such basis sets.
The spectra using different basis sets and truncated 6-31++G** basis sets are in Fig. 2f. The spectra using 6-31G and 6-31G** basis sets look very similar, which matches the truncation suggestion given in Fig. 2b. The same situation is also found in the spectra using 6-31++G and 6-31++G** basis sets. Truncated 6-31++G** basis set with x thr = 0.2 (noted as 6-31++G** trunc 12, 12 basis functions left) leads to the same basis set as 6-31++G, with an error of~0.1 eV of corresponding excitation energies. With a tighter threshold x thr = 0.1, truncated 6-31++G** basis set (noted as 6-31++G** trunc 16, 16 basis functions left) achieves more accurate spectra compared to 6-31++G basis set, with an error at the level of~0.01 eV. These results are in accordance with the x DC -x IP map introduced before. For the sake of completeness, the cases of δpulse from x or y direction are included in Supplementary Fig. 2.

Example: H 2 O dimer
Four different basis sets, def2-TZVP, def2-TZVPP, def2-TZVPD, and def2-TZVPPD 32,38 are utilized for RT-TDHF calculations of H 2 O dimer system (see Fig. 3g for the nuclear structure). Again, these four basis sets are chosen based on the addition of polarization functions and/or diffuse functions. The x DC -x IP map of this system is shown in Fig. 3a-d after 100 steps of RT-TDHF calculations. Compared to H 2 dimer, H 2 O dimer is a more complicated system and thus the x DC -x IP map is more involved. Nevertheless, it is clear that the distribution of basis functions in all plots shows a "dumbbell" shape (from left below to right top), namely, more dispersed in low and high x DC /x IP region compared to the middle range. This provides a rough idea of the range of truncation, and the cross point of x thr = 0.1 generally locates at the neck of the "dumbbell". The Jaccard indices for the four basis sets are shown in Fig. 3e. In Supplementary Figure 3, some visualizations of orbitals are shown in the x DC -x IP map of the def2-TZVP basis set.
A natural question of basis set truncation is whether one needs to do it recursively until the basis set does not change anymore (which we refer to as "recursive truncation"). Therefore, we test the recursive truncation of def2-TZVPD and def2-TZVPPD basis sets, and provide the number of basis functions together with Jaccard indices (see Fig. 3f). One-time truncation decreases the number of basis functions from 148 to 103 and from 116 to 92 for the two original basis sets, while recursive truncation only decreases the number further from 103 to 98 and from 92 to 91, respectively. It is worth mentioning that the two basis sets are very similar as one can see from the number of basis functions after the recursive truncation. The Jaccard indices give visual evidence that "def2-TZVPD trunc 91" and "def2-TZVPPD trunc 98" do not change after another truncation process since J(x) = 0 for x ∈ [0, 0.1]. It also shows that one-time truncation is good enough to significantly decrease J(x thr ) value (see two dashed lines). Considering the time and computational resources spent on recursive truncation process (usually needs several rounds of RT-TDHF/TDDFT calculation), we only focus on one-time truncation in the following.
The spectra of H 2 O dimer using def2-TZVP, def2-TZVPP, def2-TZVPD, and def2-TZVPPD basis sets give a similar conclusion as in H 2 dimer case, namely, that extra diffuse functions have large impact on the absorption spectra while extra polarization functions have limited impact on the absorption spectra (see Supplementary Figure 4). In addition, we provide the spectra after a one-time truncation and recursive truncation processes (see Fig. 3g). All spectra in this figure are very close to each other up to an excitation energy of 20 eV, indicating def2-TZVPPD includes many redundant basis functions for RT-TDHF calculations in this case. For the usage of computational resources, in the case of OðN 4 Þ scaling (HF Coulomb and exchange matrices calculated with 2-electron integrals, without real-space griding or density-fitting), a RT-TDHF run with "def2-TZVPD trunc 91" basis set only consumes (91/148) 4 = 14 % of the time compared to the original def2-TZVPPD basis set. For this system, a special interest is the effect of hydrogen bonds on the truncation process. However, we do not find any dependence of deleted basis functions on the distance (up to 10 Å) between two water molecules, and the suggested truncated basis sets are very similar. This may indicate that the basis functions needed for the description of hydrogen bonds are also important for the electronic absorption spectrum of the water monomers themselves.
Moreover, the CBS scheme is tested for the H 2 O dimer system. Two basis sets, def2-TZVPPD and def2-QZVPPD are used as the starting point for the CBS scheme, with ϵ = 10 −6 (CBS-10 −6 ). We directly modify the basis set file every time when truncating or adding basis functions. The detailed steps of the CBS scheme for def2-TZVPPD basis set are shown in Table 1. The original def2-TZVPPD basis set of the H atom and the O atom is 3s3p1d and 6s4p3d1f, respectively, with 148 basis functions for the H 2 O dimer system. After the first RT-TDHF run, the first d-subshell (1d) of H and the first d-subshell and f-subshell of O (1d1f) are truncated (shown in the bracket). The diffuse functions are then added to the remaining subshells (shown in the bracket), resulting in 4s4p for H and 7s5p3d for O. This is followed by a second RT-TDHF run with truncating and adding basis function. With a basis set of 5s4p for H and 8s6p4d for O, we find |λ| min is smaller than the threshold we set (10 −6 ), thus the newly added subshell with the highest angular moment is removed, say, 4p in H and 4d in O. Finally, we obtain a basis set of 5s3p for H and 8s6p3d for O, with 138 basis functions and |λ| min = 3.7 × 10 −6 . In Supplementary Table 1, we do it analogously for def2-QZVPPD basis set.
RT-TDHF calculations are carried out with these two CBSs, and the resulting spectra are shown in Fig. 3g. Choosing either the def2-TZVPPD or the def2-QZVPPD basis set (blue and red solid lines) can result in differences of absorption peaks with an excitation energy larger than 12 eV, while their corresponding CBSs-10 −6 (blue and red dashed lines) match until 15 eV. Also, CBSs-10 −6 leads to a red shift of 0.2 eV compared to the two original basis sets, which usually indicates the behavior of a larger basis set according to the observations in this work, while in our case this is achieved by less basis functions. In addition, we utilize combined basis sets of original basis set and its CBS-10 −6 (noted as CBS-all) for the same calculation, and we can see that each spectrum (dotted lines) also agrees with the corresponding CBS-10 −6 one.

Examples: (S)-methyloxirane and (-)-α-pinene
As other examples, (S)-methyloxirane and (-)-α-pinene molecules are tested with truncated basis sets. The def2-TZVPP 32,38 basis set is adopted as a reference basis set and the B3LYP functional is selected as the exchange-correlation functional for these two systems. x thr = 0.1 and x thr = 0.2 are used as the truncation threshold. Supplementary Tables 2 and 3 give information about the original and truncated basis sets for (S)-methyloxirane and (-)-α-pinene, respectively. The corresponding x DC -x IP maps are in Supplementary Figs. 5 and 6.
The resulting absorption spectra are shown in Fig. 4a and Supplementary Fig. 8a. The truncated basis sets, both x thr = 0.1 and x thr = Furthermore, we use the same basis sets for ECD spectra calculations, considering that the two quantities x DC and x IP do not explicitly depend on the electric dipole operator. The ECD spectra of (S)-methyloxirane and (-)-α-pinene are shown in Fig. 4b and Supplementary Fig. 8b, respectively. Table 2 gives the benchmark of basis sets used for (S)-methyloxirane and (-)-α-pinene. RT-TDDFT calculations of these two systems are carried out using CP2K. Because Coulomb and exchange and correlation (XC) terms are evaluated on grids, we do not observe a significant time-saving using the proposed truncated basis sets. Nevertheless, computational resources can be reduced as much as one order of magnitude in LR-TDDFT calculations (Gaussian09 39 ) using truncated basis sets. The corresponding memory usage of (-)-α-pinene system is also shown in the table. The memory cost of Coulomb and exchange matrices scale as OðN 4 Þ (2-electron integrals, without real-space griding or density-fitting), and one may easily encounter a memory bottleneck with large basis sets (e.g., using def2-TZVPP for (-)-α-pinene, maximal memory set to 200 GB), which, however, can be alleviated with truncated basis sets. In Supplementary Tables 4 and 5, we show the scaling information of (-)-α-pinene using HF/def2-TZVPP and its truncated basis set, and computational time to calculate Coulomb and exchange matrices, respectively. To assess the contribution from HF exchange term, we further show the difference between B3LYP/def2-TZVPP and BLYP/def2-TZVPP in the calculation of RT-TDDFT spectrum in Supplementary Fig. 7.

Example: ZnPc
ZnPc is a popular example for excited-state calculations [40][41][42][43][44][45] . This example is mainly utilized to demonstrate a step-by-step truncation from 6-31G(d,p) to 6-31G. Here we use the nuclear geometry of ZnPc from a previous study 45 with the B3LYP functional and 6-31G(d,p) 28,35,36 as the reference basis set. Figure 4c provides the information of deleted basis functions and Jaccard indices. The numbers close to the dashed lines are x thr values, and the notations on the right side are details of deleted basis functions in italic form. Note that the number of deleted basis functions (|fχ μ g deleted |) can be higher than the number calculated from the subshell notations on the right. This is because extra basis functions might also be deleted but not the corresponding full subshells, e.g., Zn 1f x thr =0.04 corresponds to a |fχ μ g deleted | value larger than 7 because some other basis functions like H 1p z (but not the full 1p) are deleted. In addition, this planar system, which we place in x − y plane in the simulation, shows some preferences for 1d xy and 1d x 2 Ày 2 of C/N elements, and thus 1d yz , 1d xz , and 1d z 2 basis functions are the first to be deleted in the range of x thr = 0.09~0.18, indicating that the truncation scheme can provide the information of preference on the orientation of basis functions (or AOs with different magnetic quantum number). It is worth mentioning that x thr = 0.09 and x thr = 0.18 truncation leads basically to the 6-31G(d) and 6-31G basis sets, except for the additional truncation on the Zn atom. Also, it is found that |fχ μ g deleted | and J(x thr ) show a very similar trend. After x thr = 0.18, both lines reach a plateau where seldom further basis functions can be removed, indicating 6-31G as a good truncated basis set. Actually, we can see this from x DC -x IP map of the same system (see Supplementary  Fig. 9) in which the truncated and remaining basis functions almost form two blocks with x thr = 0.18 (dashed blue line).
The corresponding RT-TDDFT and LR-TDDFT (1000 roots) spectra are given in Fig. 4d. It is clear that 6-31G(d,p), 6-31G(d), and 6-31G basis sets all provide similar results, which match our truncation suggestions. This shows a practical usage of our truncation scheme on the selection of basis set. In addition, the CBS-10 −6 with 6-31G(d,p) reference is constructed (see Supplementary Table 6 for the CBS process). Nevertheless, it does not change much in the RT-TDDFT/LR-TDDFT spectra compared to the original 6-31G(d,p) basis set.

Example: Ag 20
Ag 20 is a metal cluster with tetrahedral structure (T d symmetry), which has been investigated with TDDFT calculations [46][47][48] . Here we use the nuclear geometry of Ag 20 from a previous study 48 together with PBE0 49 functional and GTH 50,51 Gaussian-type pseudopotential basis sets 52 GTH-DZVP, GTH-TZVP, and GTH-TZV2P. GTH-TZV2P is used as the reference basis set for the truncation process. The Ag atoms in Ag 20 cluster are categorized into 3 groups: vertex (v), edge (e), and face (f) (see Fig. 4e). The atoms in the same group are equivalent in space and should have the same contribution to the electronic absorption spectrum. Table 3 shows the truncated basis functions versus increasing x thr values. As one can see from the table, atoms in different groups generally have different suggested basis set truncations. More basis functions are truncated for atoms at vertex position, and less for atoms at face position. This is reasonable because vertex Ag atoms have a limited space angle "bonded" with other atoms, while face Ag atoms have half of their surrounding space occupied with 9 nearest neighbors, and complex surroundings often require more basis functions to describe the interactions. For comparison, basis set information of GTH-TZVP and GTH-DZVP is also listed. The truncation scheme provides quite different basis sets from GTH-TZVP and GTH-DZVP basis set, e.g., GTH-TZVP can be regarded as −2f truncated basis set of GTH-TZV2P, but 2f basis functions are the last choice of truncation from our scheme (up to x thr = 0.6). This means that the corresponding standardly available smaller basis sets, e.g., GTH-DZVP or GTH-TZVP, do not always contain the most important basis functions (for TDDFT calculations) from the larger ones, e.g., GTH-TZV2P, which is different from what we have found for the ZnPc example system. We select x thr = 0.3 and x thr = 0.5 truncated basis sets for LR-TDDFT calculations, under the consideration that the numbers of basis functions are close to GTH-TZVP and GTH-DZVP basis sets, respectively. The truncation threshold used in the CBS scheme is x thr = 0.1. Basis set information of H and O atoms is noted as non-italic form, and the added/truncated/removed basis functions are in italic form in the brakets. "Run" refers to a short time RT-TDHF calculation with 100 steps. In the last column, the total number of basis functions of the system is listed.
The LR-TDDFT spectra (2000 roots) of the Ag 20 cluster using 5 different basis sets are shown in Fig. 4f. In general, "GTH-TZV2P trunc 684" (x thr = 0.3) gives better agreement with the reference GTH-TZV2P basis set than the one with the GTH-TZVP basis set. This can be seen as follows: 1. for the first several peaks (at~2.6 eV, 2.9 eV, and 3.4 eV), red dashed peaks all locate closer to black peaks than yellow dashed peaks; 2. for the peaks up to 7 eV, the red dashed line follows closer to the black line than the yellow dashed line. However, the difference between the spectra calculated using "GTH-TZV2P trunc 512" (x thr = 0.5) and GTH-DZVP basis sets is limited, which can be explained by their similar composition in terms of basis functions shown in Table 3. The Ag 20 example demonstrates that the proposed truncation scheme has the ability of assigning different basis sets to the atoms, according to their "interaction" with the full system.
Additionally, we also provide some testing calculations to demonstrate: 1. 1% of the total propagation time is sufficient to show the contribution from each basis function, and the same truncation suggestion has been obtained using 1%, 10%, and 100% of the RTP steps (see Supplementary Fig. 10) 2. Indicator x IP is necessary in the truncation scheme, and the truncation using only indicator x DC can lead to a different spectrum (see Supplementary Fig. 11) 3. ERIs Schwarz screening does not affect the truncation scheme, and they can be used together for the acceleration in RT-TDHF/TDDFT calculations (see Supplementary Figs. Tables 7-9).

Discussion
We have introduced an AO basis set truncation scheme for TDDFT calculations, based on the analysis of a short period of real time propagation of MO coefficients. Two quantitiesdensity matrix contribution and importance of propagation stabilityare constructed as indicators for the truncation process. The truncated basis sets are found to reproduce the electronic absorption spectra obtained with the original basis sets well. In some cases, truncated basis sets can serve as intermediate basis sets between two levels of available basis sets, or are found to be very close to lower level basis sets available, in which the truncation process works as a means to help in basis set selection. Two intuitive graphs, x DC -x IP map and Jaccard index, are introduced for the analysis of basis functions. These graphs also provide a guide for the choice of the truncation threshold x thr (e.g., see diagrams in Fig. 5).
As opposed to basis sets constructed mainly for the purpose of energy minimization and geometry optimization, the truncation scheme proposed provides a task-, system-, and chemical environment-specific basis set. It has reduced number of basis functions and accelerates the calculations involving the construction of Coulomb and/or exchange matrices iteratively in every propagation step, potentially with the scaling of OðN 4 Þ. ERIs usually benefit from evaluating all components from the same shell. However, they are only computed once before the propagation. Because the truncation process is carried out on the original AO basis set without any rotation or reconstruction of basis functions, the truncated basis sets can be easily employed in any quantum chemistry package using Gaussian-type (or Slater-type) basis, with a simple modification of the basis set file. Additionally, we have tested recursive truncation to show that the process is robust and will result in a "truncation consistent" basis set given a certain x thr . Though the truncation is based on the analysis of real-time propagation, the basis sets produced can also been used for LR-TDDFT calculations and provide equally good spectra. Nevertheless, the acceleration of LR-TDDFT calculations depends on the the systems and purposes of the research, e.g., limited number of excitations in LR-TDDFT for a small system may not be worth an additional RT-TDDFT calculation to determine a truncated basis set, while a highly conjugated or a large system with excitations of higher energies should benefit from the truncation scheme. How the truncation scheme and truncated basis set might be transferable to more accurate yet expensive methods like GW/Bethe-Salpeter equation [53][54][55] , time-dependent coupled cluster/configurational interaction [56][57][58] , or other type of any excited-state calculations might be explored in the future.
Furthermore, an "Add-While-Truncate" algorithm has been proposed to construct basis sets towards the complete basis set limit. The additional basis functions are added as diffuse functions in an even-tempered manner, and no extra polarization functions are added. The neglect of polarization functions is primarily based on truncation experiences we have got from this study (e.g., in H 2 O dimer, (S)-methyloxirane, and (-)-α-pinene systems). There are some discussions about the use of polarization and diffuse functions used for electric dipole moment, polarizability, and TDDFT calculations in previous works [59][60][61][62][63] . Nevertheless, as shown in test examples, the truncation process provides the possibility to select polarization and diffuse functions quantitatively. The proposed CBS scheme can construct basis sets to arbitrary accuracy, depending on a predefined parameter limited by the linear dependency between basis functions.
The truncation scheme might reveal some intrinsic knowledge for the better description of electronic excitations between ground state and excited states, and offer a thought for the design of basis sets in TDDFT calculations. Future work can be on both basis set constructions and migration to other excited-state calculations or properties. In this work, all original basis sets employed are ground state energy-optimized, however, there is another group of completeness-optimized basis sets [64][65][66] , with which one may also test the efficiency and validate the accuracy towards CBS-limit 65 . Auxiliary density matrix methods 67 provide an alternative way to accelerate HF exchange calculation via auxiliary basis set, and have been found to yield highly accurate results in energies and response properties 63 . Considering the computational demanding HF exchange calculation employed in hybrid functionals, it is possible to further assess the truncation scheme for auxiliary basis sets. In addition, the idea of decomposing the electric dipole contributions into the contribution from individual basis functions can be migrated to other properties and produce different task specific basis sets. One may be interested in a truncated basis for dynamic calculations, Fig. 4 | RT-TDDFT calculations of (S)-methyloxirane, ZnPc, and the Ag 20 structures. (S)-methyloxirane a electronic absorption spectra and b electronic circular dichroism spectra using original and truncated basis sets of def2-TZVPP (stick spectra below correspond to LR-TDDFT ones). c Number of deleted basis functions (|fχ μ g deleted |) and corresponding Jaccard indices of 6-31G(d,p) basis set for ZnPc system. Purple, red, green, and blue dashed lines represent different x thr values, and the corresponding colored notations give the deleted basis functions in italic form. Note that the red dashed line with x thr = 0.09 and blue dashed line with x thr = 0.18 essentially reduce 6-31G(d,p) to 6-31G(d) and 6-31G basis set, respectively (except for the addition truncation on the Zn atom). d RT-TDDFT/LR-TDDFT electronic absorption spectra using 6-31G(d,p), 6-31G(d), 6-31G, and 6-31G(d,p) CBS-10 −6 basis sets for ZnPc system (stick spectra below correspond to LR-TDDFT ones). e Ag 20 geometry with colored vertex atoms (top), colored edge atoms (middle), and colored face atoms (bottom). f Ag 20 LR-TDDFT electronic absorption spectra using GTH-TZV2P, GTH-TZVP, GTH-DZVP, and truncated GTH-TZV2P basis sets. Below are the stick spectra and above are the Gaussian broadened spectra with FWHM = 0.12 eV. The number after the basis set label indicates the number of basis functions. a.u. arbitrary units. The truncation threshold used are xthr = 0.1 and x thr = 0.2. The RT-TDDFT time is noted as time per step (40'000 steps in total) and LR-TDDFT time is noted as full computational time. The memory usage of LR-TDDFT calculations of (-)-α-pinene system is also presented in square brackets. LR-TDDFT calculations are processed with 8 CPU cores@3.3 GHz, and RT-TDDFT calculations are processed with 2 × 12 CPU cores@2.6 GHz and 32 × 12 CPU cores@2.6 GHz for (S)-methyloxirane and (-)-α-pinene, respectively. s second, h hour, GB gigabyte which, however, might require further investigations on the consistency in the truncation for each nuclear configuration. Apart from basis set truncation, a direct basis set optimization algorithm (e.g., on exponents of Gaussian-type basis functions) is also possible given a proper loss function based on x DC and x IP parameters. While we have only tested the truncation process on neutral molecules in this study, charged systems could also be investigated. This would be an interesting topic since it may demonstrate the dependence of necessary basis functions on different charges for excited state calculations (e.g., effect of diffuse function on anions which is known for ground state cases). In summary, our basis set truncation scheme provides a robust process for decreasing the number of basis functions and speeding up TDDFT calculations, while preserving the high accuracy of the spectra. The quantitative basis set analysis allows a profound understanding of the basis functions employed and opens up a broad area for potential research in excited state calculations.
A δ-pulse is chosen as the electric field perturbation to excite the molecules in RT-TDHF/TDDFT calculations. The application of the δ-pulse can be thought of as being applied instantly to the converged ground states MOs |ϕ 0 i between a time t = 0 − and t = 0 + . It corresponds to an impulse 75 where ℏ is reduced Planck constant. The vector κ ! indicates the direction and amplitude of the perturbation. The propagation is then started from the perturbed MOs |ψ δ i i.  Due to the T d symmetry of the system, we can categorize Ag atoms into 3 groups: vertex (4 atoms), edge (12 atoms), and face (4 atoms), with the notation "v", "e", and "f", respectively. Different truncation thresholds x thr are employed and GTH-TZVP and GTH-DZVP basis set information are also listed for comparison. Truncated basis functions are in italic form. Red/gray denote basis functions that can/cannot be removed. From left to right: a high correlation between x DC and x IP , basis set can be truncated at any accuracy (e.g., H 2 dimer 6-31G**, pulse z direction); b high correlation between x DC and x IP until certain s (where s is boundary of high and low correlation regions), basis set can be truncated at accuracy with x thr ≤ s (e.g., H 2 dimer 6-31++G**, pulse z direction); c two blocks with low correlation regions, basis set is recommended to be truncated at connected part between two blocks (e.g., all other test examples), which is also the most common case; d fully low correlation region, basis set can hardly be truncated (e.g., H 2 dimer 6-31G/6-31++G, pulse z direction).

Data availability
The data generated in this study have been deposited at https://gitlab. uzh.ch/lubergroup/ao-truncation. Source data are provided in this paper.